1 Introduction

NML [2] cores, i.e. optimized NML modules/macros that implement a certain operation, function or task,
can be embedded in the C source code of a given application. Figure 1 shows the compilation flow when
NML modules are used in a C program. Two types of integration schemes are currently supported by
the XPP-VC compiler [1]. One refers to the integration of NML modules that release their resources after
computing and interface to the C program via array variables (memories). The other refers to
NML modules that do not release their resources and interface to the C program via scalar variables.
The name of each NML module must start with "XPP_" and there must exist an NML file with the same
name where the module is defined. A C header file (XPPlib.h in the figures of this document), where each
module's interface is declared, can be used (the other way is to specify the interface declaration in the
C program). Internal memories used by the NML module must be pre-placed and the placement information
must be declared in the interface declaration. Special pragmas are used to declare the positions of the
memories on the XPP [3]. Table 1 shows the pragmas supported and Table 2 shows some pragmas that will be
considered in future improvements. The interface specification between the C code and the NML module
must be preceded by the pragma identifying the NML module (#pragma MODULE, see Table 1).
Memories used only in the scope of the module must also be declared using #pragma IRAM <x>, <y>
without a name.
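The naming rule stated above (the module name starts with "XPP_" and an NML file with the same name exists) can be checked mechanically. The following is a minimal host-side sketch in plain C. The helper name nml_file_for_module is hypothetical and not part of XPP-VC, and the ".nml" extension is assumed from the file names used in the figures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of XPP-VC): derives the name of the NML file
 * that must exist for a module. Returns 0 on success, -1 if the module name
 * does not start with "XPP_" or the output buffer is too small. */
int nml_file_for_module(const char *module, char *out, size_t out_size)
{
    static const char prefix[] = "XPP_";
    int written;

    assert(module != NULL && out != NULL);

    /* Rule from the text: every module name starts with "XPP_". */
    if (strncmp(module, prefix, sizeof prefix - 1) != 0)
        return -1;

    /* Rule from the text: the NML file carries the same name as the module.
     * The ".nml" extension is an assumption taken from the figures. */
    written = snprintf(out, out_size, "%s.nml", module);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return 0;
}
```

For example, calling the helper with "XPP_div" fills the buffer with "XPP_div.nml", while a name such as "div" is rejected.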
Table 1: Pragmas used to specify the interface between NML modules and C code.

| Reference | Syntax | Comments | Current Support |
|---|---|---|---|
| NML modules | #pragma MODULE "<name>" | Defines that name refers to an NML module or to an instance of an NML module. | yes |
| placement of NML modules | #pragma place "<name>" <X>,<Y> | Manual placement of an instance of an NML module. When the position specified refers to an IRAM, the compiler marks that IRAM as used by the current configuration. The name is not needed when the pragma is used after the module invocation. | yes |
| NML modules | #pragma inline "<name>" | Instructs the compiler to instantiate the NML module without creating a special configuration for this module. | yes (can be used in the first approach) |
| array variables in internal memories | #pragma IRAM <X>,<Y> <array name> | "X" and "Y" define the IRAM used to accommodate the array. | yes |
| internal memories used by the NML core internally | #pragma IRAM <X>,<Y> | "X" and "Y" define the IRAM used. | yes |
| constants | #pragma CONST <value> | The order of the declarations must be the order of the constants in the NML module. | yes |

Figure 1: Compilation flow when integrating NML modules in C programs. 



Table 2: Pragmas used to specify the interface between NML modules and C code (cont.).

| Reference | Syntax | Comments | Current Support |
|---|---|---|---|
| array variables in external memories | #pragma EXRAM <Z> <array name> <base address> <size> | "Z" identifies the I/O port used to interface to the external RAM where the array is located. | no (requires XPP-VC support to specify the base address of each array mapped to an external memory) |
| external memories used by the NML core internally | #pragma EXRAM <Z> <base address>, <size> | "Z" identifies the I/O port used to interface to the external RAM. | no (requires XPP-VC support to specify the base address of each array mapped to an external memory) |
| I/O ports | #pragma IN <Z> <name>, #pragma OUT <Z> <name>, or #pragma INOUT <Z> <name> | Declares an I/O port at position "Z" as input (IN), output (OUT), or input/output (INOUT). Assigns a name to that I/O port. | no (should be specified in the interface, but not used by XPP-VC; used as a form of documentation) |
2 Modules not Connected to the NML Structures Generated by XPP-VC

To instantiate a module, the user must use the module's name like a normal C function call. The declaration
of the module is done as a normal C function plus the pre-defined C pragmas that specify the interface.
For each array variable in the function's argument list mapped to IRAMs, a declaration of the location of
the IRAM that contains it must be given using #pragma IRAM <x>, <y> <array name>.

In this approach, the compiler only supports array variables as arguments. However, constants (which
can be used to communicate parameterizable features: identification of an I/O port, for instance) can be
specified using the pragma CONST. For internal RAMs, we assume a one-to-one mapping (each internal
memory IRAM contains a single array variable). The place and CONST pragmas can be used after
the invocation of the module in the C program. In this case the programmer does not need to use the
MODULE pragma (see Figure 5).

Each module called in the C code is automatically embedded in the NML output file. By default, the
compiler generates one configuration for each call. If instructed by the inline pragma, NML modules
can be embedded in the same configuration in conjunction with structures generated from C code. This
option can only be used with independent modules, which must also be independent from the C code
segments in the same configuration; it is a scheme to include concurrent tasks in the same
configuration (see footnote 1). Each module used in the C code must self-release its resources after
completion of the computation. Each module must have only one configuration. NML applications with
more than one configuration must be integrated into C code explicitly, by integrating each of the modules
(configurations) individually.

Consider the DCT algorithm shown in Figure 2. It consists of two 8x8 matrix multiplications. Assuming
the existence of an optimized NML module to perform the multiplication of two square matrices,
the user can re-program the algorithm using the optimized module (see the source code in Figure 3). The
XPP_next_conf calls in comments in Figure 3 illustrate the configuration boundaries that will be
automatically inserted by the compiler to furnish one configuration for each module invocation. The interface
declaration for the XPP_mat_mult module can be seen in Figure 4. Figure 4(a) shows the definition of
N (the number of rows or columns of the matrix), which is used to parameterize the NML module (see Figure
4(b)).
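The claim that the DCT of Figure 2 reduces to two square matrix multiplications can be checked on a host machine. The sketch below is plain C and does not use XPP-VC: ref_mat_mult is an assumed host-side stand-in for the NML module (computing C = A x B on row-major N x N matrices), and dct_loops, dct_two_calls and dct_forms_agree are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <string.h>

#define N 8          /* rows/columns of each square matrix */
#define M (N * N)    /* elements per matrix */

/* Host-side stand-in for the NML module (assumption): C = A x B,
 * all matrices N x N, stored row-major. */
void ref_mat_mult(const int A[], const int B[], int C[])
{
    assert(A != NULL && B != NULL && C != NULL);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int tmp = 0;
            for (int k = 0; k < N; k++)
                tmp += A[i * N + k] * B[k * N + j];
            C[i * N + j] = tmp;
        }
}

/* The two loop nests of Figure 2. */
void dct_loops(const int InIm[], const int CosBlock[], int OutIm[])
{
    int TempBlock[M];

    /* InIm x CosBlock(T) */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int tmp = 0;
            for (int k = 0; k < N; k++)
                tmp += InIm[i * N + k] * CosBlock[k + j * N];
            TempBlock[i * N + j] = tmp;
        }

    /* CosBlock x TempBlock */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int tmp = 0;
            for (int k = 0; k < N; k++)
                tmp += TempBlock[k * N + j] * CosBlock[i * N + k];
            OutIm[i * N + j] = tmp;
        }
}

/* The same computation expressed as two module-style calls. */
void dct_two_calls(const int InIm[], const int CosBlock[], int OutIm[])
{
    int CosTrans[M], TempBlock[M];

    /* Build the transpose of CosBlock explicitly. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            CosTrans[i * N + j] = CosBlock[j * N + i];

    ref_mat_mult(InIm, CosTrans, TempBlock);
    ref_mat_mult(CosBlock, TempBlock, OutIm);
}

/* Returns 1 when both formulations agree on a deterministic test pattern. */
int dct_forms_agree(void)
{
    int InIm[M], CosBlock[M], a[M], b[M];

    for (int i = 0; i < M; i++) {
        InIm[i] = (i * 7) % 11 - 5;
        CosBlock[i] = (i * 3) % 13 - 6;
    }
    dct_loops(InIm, CosBlock, a);
    dct_two_calls(InIm, CosBlock, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

The test pattern is arbitrary small integer data, not real cosine coefficients; it only serves to compare the two formulations element by element.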

In this example, the XPP memory resources are statically pre-defined for each NML module. Thus, to
transfer different array variables to distinct instances, the user must explicitly copy array elements to the
array variables that are used to communicate data between the program and the NML module (see
Figure 3). Consequently, all instances in the code of a specific NML module with memories in fixed
positions must use the same list of array variables as argument list.

Another scheme is the use of parameterized memory positions. In this case, for each invocation of the
NML module in the C code, the memory positions must be defined according to the initial positions of
the internal memories for each array variable (Figure 5 shows a DCT example using a parameterized
module to do the matrix multiplication). In this case no movement/relocation of data between internal
memories is necessary.
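In Figure 5, each invocation of the parameterized module is followed by a list of CONST pragmas: the X and Y position of the memory of each array argument, in argument order, followed by the matrix dimension. The following sketch shows how such a list could be assembled on the host, for instance by a script that generates the pragmas. The type iram_pos and the function build_const_list are hypothetical and not part of XPP-VC.

```c
#include <assert.h>
#include <stddef.h>

/* Position of one internal memory (IRAM) on the XPP array. */
struct iram_pos {
    int x;
    int y;
};

/* Hypothetical helper (not part of XPP-VC): flattens the memory positions of
 * the array arguments into the order used by the CONST pragmas in Figure 5:
 * X and Y of each argument in turn, followed by the matrix dimension n.
 * Returns the number of constants written, or 0 if `out` is too small. */
size_t build_const_list(const struct iram_pos args[], size_t num_args,
                        int n, int out[], size_t out_len)
{
    size_t needed = 2 * num_args + 1;

    assert(args != NULL && out != NULL);
    if (out_len < needed)
        return 0;

    for (size_t i = 0; i < num_args; i++) {
        out[2 * i] = args[i].x;       /* X position of argument i */
        out[2 * i + 1] = args[i].y;   /* Y position of argument i */
    }
    out[2 * num_args] = n;            /* last constant: rows/columns */
    return needed;
}
```

For the second invocation in Figure 5 (CosBlock at 1,3, TempBlock at 1,2, OutIm at 1,0 and 8 row and column elements), the helper yields the sequence 1, 3, 1, 2, 1, 0, 8, which matches the order of the CONST pragmas listed there.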

Initialization of array variables in the declarative section of the C program (e.g., int a[] = {1, 4, 9};)
and used by an NML module is inserted in the first configuration of the application. However, examples
where the NML module is the first configuration in the application require xmap support (this feature is
planned).

1 Note that it must be ensured that the same PAE is not used by different independent designs in the same module.

// file dct.c
...
// do InIm x CosBlock(T)
for(i=0; i<N; i++) {
  for(j=0; j<N; j++) {
    tmp = 0;
    for(k=0; k<N; k++)
      tmp += InIm[i*N+k] * CosBlock[k+j*N];
    TempBlock[i*N+j] = tmp;
  }
}
// do CosBlock x TempBlock
for(i=0; i<N; i++) {
  for(j=0; j<N; j++) {
    tmp = 0;
    for(k=0; k<N; k++)
      tmp += TempBlock[k*N+j] * CosBlock[i*N+k];
    OutIm[i*N+j] = tmp;
  }
}

Figure 2: C code for a DCT implementation based on matrix multiplications.

3 Modules Connected to the NML Structures Generated by XPP-VC 

Another possibility is to embed and interconnect NML modules with the NML generated by the XPP-VC
compiler. In this case there can exist interconnections between scalar variables of the C code and ports
of the NML modules. This is done by using C structs to define each NML module (each struct must have
a field with the same name as the related NML module, which must start with "XPP_") and two special
functions: XPP_getmacro and XPP_putmacro. They are used to connect variables of the C code to
the input/output ports of the NML module.

Figure 6 shows a segment of code using an NML module to do integer division ("XPP_div"). Figure 7 
shows the header of the XPP_div module and a segment of the NML code generated by the compiler 
using an instance of that module. Figure 8 shows how to share the same instance of an NML module and 
Figure 9 shows how to use more than one instance of the same NML module. 
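To make the data flow of the put/get calls concrete, the following host-side sketch mimics the shape of the code in Figure 6(b) in plain C. It is not how XPP-VC implements these calls: the two macros are simple stand-ins that copy values, and divMOD_eval is a hypothetical host-only helper that plays the role of the NML divider.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the divMOD declaration of Figure 6(c). */
typedef struct {
    int XPP_div;   /* identifies the name of the NML module */
    int a, b, c;   /* ports: c = a / b */
} divMOD;

/* Host-only stand-ins (assumption): on the XPP these calls connect C
 * variables to module ports; here they simply copy values. */
#define XPP_putmacro(port, value) ((port) = (value))
#define XPP_getmacro(port, ptr)   (*(ptr) = (port))

/* Hypothetical host-only helper that plays the role of the NML module. */
void divMOD_eval(divMOD *m)
{
    assert(m != NULL && m->b != 0);
    m->c = m->a / m->b;
}

/* Element-wise integer division written in the style of Figure 6(b). */
void array_div(const int A[], const int B[], int C[], int n)
{
    divMOD div1 = {0, 0, 0, 0};
    int ai, bi, ci;

    for (int i = 0; i < n; i++) {
        ai = A[i];
        bi = B[i];
        XPP_putmacro(div1.a, ai);
        XPP_putmacro(div1.b, bi);
        divMOD_eval(&div1);          /* host-only: the module computes c */
        XPP_getmacro(div1.c, &ci);
        C[i] = ci;
    }
}
```

In the real flow there is no explicit evaluation step; the explicit call only exists here because the host model has no hardware behind the ports.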
Figure 10 presents another example: a C program using XPP FIFOs. 

Note that the interface declarations attributed to an NML module instance override possible previous
declarations attributed to the name of the NML module.

When it is not possible to synchronize NML module instances with their I/O port connections in a 
dataflow scheme, an end of completion signal can be used (since it is not possible to declare event 
signals in C, an integer type must be used): 
#include "XPPlib.h"
...
const int CosBlock[M] = {...};
const int CosTrans[M] = {...}; // the transpose of CosBlock
// XPP_next_conf();
XPP_mat_mult(InIm, CosTrans, TempBlock); // matrix multiplication in NML
// XPP_next_conf();
// copy the values to the arrays used as arguments of XPP_mat_mult
for(i=0; i<M; i++) {
  InIm[i] = CosBlock[i];
  CosTrans[i] = TempBlock[i];
}
// XPP_next_conf();
XPP_mat_mult(InIm, CosTrans, TempBlock); // matrix multiplication in NML
// XPP_next_conf();

Figure 3: A DCT implementation using NML modules to perform the matrix multiplications. Each NML
module will use a different configuration.



// file XPPlib.h: (a)
...
// The multiplication of two quadratic matrices in NML, function:
void XPP_mat_mult(int A[], int B[], int C[]);
// The specification of the arguments of the NML module
#pragma MODULE "XPP_mat_mult" // identify the module names using "<name>"
#pragma IRAM 1,0 A // IRAM <X>,<Y> <array name>
#pragma IRAM 1,1 B
#pragma IRAM 1,2 C
#pragma CONST 8 // number of elements in each row and column of the square matrix
// other modules:

// file dct.nml: (b)
INCLUDE "XPP_mat_mult.nml"
...
MODULE MOD2 {
  OBJ o1: XPP_mat_mult [8] { }
}

Figure 4: (a) Interface definition for the NML module XPP_mat_mult; (b) segment of the NML file
generated by XPP-VC related to the second configuration.
#include "XPPlib.h"
const int CosBlock[M] = {...};
const int CosTrans[M] = {...}; // the transpose of CosBlock
...
XPP_mat_mult(InIm, CosTrans, TempBlock); // matrix multiplication in NML
#pragma place 0,0
#pragma CONST 1 // X position of memory with InIm
#pragma CONST 0 // Y position of memory with InIm
#pragma CONST 1 // X position of memory with CosTrans
#pragma CONST 1 // Y position of memory with CosTrans
#pragma CONST 1 // X position of memory with TempBlock
#pragma CONST 2 // Y position of memory with TempBlock
#pragma CONST 8 // number of row and column elements
XPP_mat_mult(CosBlock, TempBlock, OutIm); // matrix multiplication in NML
#pragma place 0,0
#pragma CONST 1 // X position of memory with CosBlock
#pragma CONST 3 // Y position of memory with CosBlock
#pragma CONST 1 // X position of memory with TempBlock
#pragma CONST 2 // Y position of memory with TempBlock
#pragma CONST 1 // X position of memory with OutIm
#pragma CONST 0 // Y position of memory with OutIm
#pragma CONST 8 // number of row and column elements

Figure 5: DCT example using NML modules to perform the matrix multiplications. Each NML module
will be mapped to a different configuration. In this case an NML module with parameterized memory
positions is used.

int end_mod;
XPP_putmacro(mod1.a, a);
do {
  XPP_getmacro(mod1.end, &end_mod);
} while (end_mod == 0);

In the example above, a connection is made between the C variable a and the port a of the instance mod1
of an NML module. After that, the program waits for the completion of the execution of the instance,
which is signaled by a value different from zero in the output port end of the NML module instance.
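The polling pattern described above can be tried out on a host with a small model. The sketch below is an illustration under stated assumptions, not XPP-VC code: endMOD, mock_start and mock_read_end are hypothetical names, and the module is modeled as a job that raises its end port after a given number of reads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module instance with an input port `a` and an `end` port. */
typedef struct {
    int a;
    int end;
    int busy_polls;   /* host-only: reads remaining before completion */
} endMOD;

/* Host-only model: writing port a starts a job that completes after
 * `latency` reads of the end port (at least one read is always needed). */
void mock_start(endMOD *m, int a, int latency)
{
    assert(m != NULL && latency >= 0);
    m->a = a;
    m->end = 0;
    m->busy_polls = latency;
}

/* Host-only model of reading the end port: each read advances the job. */
int mock_read_end(endMOD *m)
{
    assert(m != NULL);
    if (m->busy_polls > 0)
        m->busy_polls--;
    if (m->busy_polls == 0)
        m->end = 1;
    return m->end;
}

/* Polling loop with the same shape as in the text; returns the number of
 * reads of the end port that were needed. */
int wait_for_end(endMOD *m)
{
    int end_mod;
    int polls = 0;

    do {
        /* stands in for XPP_getmacro(mod1.end, &end_mod) */
        end_mod = mock_read_end(m);
        polls++;
    } while (end_mod == 0);
    return polls;
}
```

As in the text, an int carries the completion signal because event signals cannot be declared in C; the loop terminates as soon as a non-zero value is read.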

References 

[1] Joao M. P. Cardoso and M. Weinhardt, "XPP-VC: A C Compiler with Temporal Partitioning for the
PACT-XPP Architecture," in Proc. 12th International Conference on Field Programmable Logic and
Applications (FPL'02), LNCS, Springer-Verlag, 2002.
// original C code (file arraydiv.c): (a)
for(i=0; i<M; i++) {
  C[i] = A[i]/B[i];
}
...

// C code using an NML module to perform the divisions (b)
#include "XPPlib.h"
divMOD div1;
for(i=0; i<M; i++) {
  ai = A[i];
  bi = B[i];
  XPP_putmacro(div1.a, ai);
  XPP_putmacro(div1.b, bi);
  XPP_getmacro(div1.c, &ci);
  C[i] = ci;
}

// file XPPlib.h: (c)
// The declaration of the NML module to do integer division
// XPP_div.nml is the name of the NML file where the module
// XPP_div is defined
// computes c = a/b
typedef struct {
  int XPP_div; // identifies the name of the NML module
  int a, b, c;
} divMOD;
#pragma MODULE "XPP_div"
#pragma IRAM 1,0 // it uses an internal RAM in position X=1,Y=0

Figure 6: Example of embedding an NML module, which performs integer division, into C code: (a)
example in C; (b) the same example in stylized C; (c) the description of the NML module in the library.
// file XPP_div.nml: (d)
...
// The integer division in NML:
MODULE XPP_div(DIN a, DIN b, DOUT c) {
  // NML code to perform the division
  ...
}

// file arraydiv.nml: (e)
...
INCLUDE "XPP_div.nml"
...
MODULE ex {
  // NML code generated by the XPP-VC to interface to the NML module
  OBJ div1: XPP_div { }
  div1.a = <object generated by XPP-VC>.X
  div1.b = <object generated by XPP-VC>.X
  <object generated by XPP-VC>.<A | B> = div1.c
  ...

Figure 7: Example of embedding an NML module, which performs integer division, into C code (cont.):
(d) the NML module; (e) the NML module integrated in the NML code of the design.

[2] PACT XPP Technologies, Inc., Germany, "NML Reference: Release 2.0," Technical Report, April 2001.

[3] PACT XPP Technologies, Inc., Germany, "The XPP White Paper: A Technical Perspective," Technical
Report, March 2002.
// original C code (file arraydiv1.c): (a)
for(i=0; i<M; i+=2) {
  C[i] = A[i]/B[i];
  C[i+1] = A[i+1]/B[i+1];
}

// C code using a shared NML module to perform both divisions: (b)
#include "XPPlib.h"
divMOD div1;
for(i=0; i<M; i+=2) {
  ai = A[i];
  bi = B[i];
  XPP_putmacro(div1.a, ai);
  XPP_putmacro(div1.b, bi);
  XPP_getmacro(div1.c, &ci);
  C[i] = ci;
  ai = A[i+1];
  bi = B[i+1];
  XPP_putmacro(div1.a, ai);
  XPP_putmacro(div1.b, bi);
  XPP_getmacro(div1.c, &ci);
  C[i+1] = ci;
}

Figure 8: Example embedding more than one NML module instance, which performs integer division,
into C code: (a) example in C, which uses two dividers; (b) the example using one NML instance of the
divider to perform the two divisions.
// C code using two instances of the NML DIVIDER: (c)
#include "XPPlib.h"
divMOD div1;
#pragma place "div1" 0,0
#pragma MODULE "div1"
#pragma IRAM 1,1 // the IRAM used when the module is placed in 0,0
divMOD div2;
#pragma place "div2" 0,6
#pragma MODULE "div2"
#pragma IRAM 1,7 // the IRAM used when the module is placed in 0,6
for(i=0; i<M; i+=2) {
  ai = A[i];
  bi = B[i];
  XPP_putmacro(div1.a, ai);
  XPP_putmacro(div1.b, bi);
  XPP_getmacro(div1.c, &ci);
  C[i] = ci;
  ai = A[i+1];
  bi = B[i+1];
  XPP_putmacro(div2.a, ai);
  XPP_putmacro(div2.b, bi);
  XPP_getmacro(div2.c, &ci);
  C[i+1] = ci;
}

Figure 9: Example embedding more than one NML module instance, which performs integer division,
into C code (cont.): (c) the example using two NML instances of the divider.
// C code using XPP FIFOs (file ex.c): (a)
#include "XPPlib.h"
...
FIFO fifo1;
#pragma place "fifo1" 1,0 // place the fifo in position (1,0)
FIFO fifo2;
#pragma place "fifo2" 1,1 // place the fifo in position (1,1)
main() {
  int input_sample_real, delay_value_1, delay_value_2;
  int found_flag = 0;
  int zero_counter = 0;
  while (1) {
    XPP_getstream(1, 0, &input_sample_real);
    XPP_putmacro(fifo1.in, input_sample_real);
    XPP_getmacro(fifo1.out, &delay_value_1);
    XPP_putmacro(fifo2.in, delay_value_1);
    XPP_getmacro(fifo2.out, &delay_value_2);
    if ((input_sample_real delay_value_1 +
         delay_value_1 - delay_value_2) 0) {
      zero_counter++;
      if (zero_counter == 64) {
        found_flag = 1;
        zero_counter = 0;
      }
    }
    XPP_putstream(4, 0, found_flag);
  }
}

// file XPPlib.h: (b)
// The declaration of the XPP FIFO:
typedef struct {
  int XPP_FIFO; // identifies the name of the NML module
  int in, out;
} FIFO;

Figure 10: Example of a C program using an XPP FIFO: (a) example in C; (b) the description of the
NML module in the library.




// file XPP_FIFO.nml: (c)
MODULE XPP_FIFO(DIN in, DOUT out) {
  OBJ fifoX: FIFO @ 0,0 { // relative position used
    in = in
  }
  out = fifoX.OUT
}

// file ex.nml: (d)
INCLUDE "XPP_FIFO.nml"
OBJ fifo1: XPP_FIFO @ $1,$0 { }
fifo1.in = <object generated by XPP-VC>.X
<object generated by XPP-VC>.<A | B> = fifo1.out
OBJ fifo2: XPP_FIFO @ $1,$1 { }
fifo2.in = <object generated by XPP-VC>.X
<object generated by XPP-VC>.<A | B> = fifo2.out

Figure 11: Example of a C program using an XPP FIFO (cont.): (c) the FIFO in NML (relative position
is used); (d) the section of the final ex.nml file with the FIFO's instantiations and connections.